Skip to content

fix: increment da height when we have already processed the height previously#2867

Closed
tac0turtle wants to merge 2 commits into
mainfrom
marko/increment_da_height
Closed

fix: increment da height when we have already processed the height previously#2867
tac0turtle wants to merge 2 commits into
mainfrom
marko/increment_da_height

Conversation

@tac0turtle

Copy link
Copy Markdown
Contributor

Overview

When we p2p sync its much faster than da sync making our processed height higher than what is coming form da, this uncovered a bug in which we were not incrementing the processed da height on the node.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @tac0turtle, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where the Data Availability (DA) height was not being correctly incremented on the node when a block had already been processed via a faster P2P sync. The fix ensures that even if a block is already known, if a DA-sourced event for that block arrives with a higher DA height, the node's state is properly updated to reflect the latest DA height, maintaining data consistency and preventing discrepancies between P2P and DA sync heights.

Highlights

  • DA Height Update Logic: Modified the processHeightEvent function to ensure that the Data Availability (DA) height is correctly updated in the node's state, even if the block corresponding to the DA event has already been processed through another source (e.g., P2P sync). This addresses a bug where the DA height would not increment in such scenarios.
  • New updateStateDAHeight Function: Introduced a new helper function updateStateDAHeight responsible for safely updating the DAHeight in the node's state and persisting it to the store. This function includes checks to prevent unnecessary updates if the new DA height is not greater than the current one.
  • Comprehensive Test Coverage: Added a new dedicated test file syncer_da_test.go with TestProcessHeightEvent_UpdatesDAHeight_WhenAlreadyProcessed to validate the new DA height update logic under scenarios where blocks are processed out of order or from different sources, ensuring the fix works as intended.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions Bot commented Nov 20, 2025

Copy link
Copy Markdown
Contributor

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedNov 20, 2025, 3:01 PM

@claude

claude Bot commented Nov 20, 2025

Copy link
Copy Markdown
Contributor

Claude finished @tac0turtle's task —— View job


Code Review Complete

I've reviewed this PR which fixes a bug where DA height wasn't being incremented when blocks were already processed via P2P sync. The implementation is sound overall, but I've identified several areas for improvement.


Critical Issues

1. Race Condition in updateStateDAHeight (syncer.go:782-796)

Severity: High

The updateStateDAHeight function has a potential race condition. Between checking the DA height and updating it, another goroutine could have already updated it:

func (s *Syncer) updateStateDAHeight(daHeight uint64) error {
    currentState := s.GetLastState()  // Load state
    if daHeight <= currentState.DAHeight {
        return nil
    }
    
    currentState.DAHeight = daHeight  // Modify copy
    // Another goroutine could update state here
    
    batch, err := s.store.NewBatch(s.ctx)
    // ... saves potentially stale state
}

Problem: The function loads state, checks it, then saves it back. If another goroutine calls this function concurrently, both could pass the check and the second write could overwrite a higher DA height with a lower one.

Recommendation: Add synchronization (mutex) or use atomic compare-and-swap pattern. Alternatively, document that this function must only be called from the single-threaded processLoop.


2. Missing DA Height Atomic Update (syncer.go:788)

Severity: Medium

When updating the state's DA height, the in-memory atomic s.daHeight is not updated:

currentState.DAHeight = daHeight
// ... save to store
s.SetLastState(state)  // Updates lastState atomic pointer
// But s.daHeight atomic is NOT updated here

Problem: The s.daHeight atomic variable (line 109) is used by the DA worker loop (line 300: daHeight := s.GetDAHeight()). If it's not updated when processing already-seen blocks, there's a potential inconsistency between state.DAHeight and s.daHeight.

Recommendation: Call s.SetDAHeight(daHeight) after successfully saving the state in updateStateDAHeight, similar to what's done in saveState (line 807).


Major Issues

3. Incomplete Test Coverage

Severity: Medium

The test TestProcessHeightEvent_UpdatesDAHeight_WhenAlreadyProcessed only covers one scenario. Missing test cases:

  • Concurrent updates: What happens if multiple DA events arrive with different DA heights for already-processed blocks?
  • DA height regression: What if a DA event arrives with a lower DA height than current state? (Test should verify it's NOT updated)
  • Store failure: What happens if the batch commit fails during updateStateDAHeight?
  • Edge case: DA height == current DA height (should be a no-op)

Recommendation: Add test cases for these scenarios, especially concurrent behavior if the function is meant to be called from multiple goroutines.


4. Error Handling Inconsistency (syncer.go:449-451)

Severity: Medium

When updateStateDAHeight fails, the error is logged but processing continues:

if err := s.updateStateDAHeight(event.DaHeight); err != nil {
    s.logger.Error().Err(err).Msg("failed to update DA height")
}
// Continues execution and returns

Problem: This is inconsistent with other error handling in the codebase. If DA height update fails, the state is now inconsistent with what was expected from DA.

Recommendation: Consider whether this should be a critical error. At minimum, add metrics to track these failures, or potentially send to s.errorCh if it represents a persistent storage issue.


Minor Issues

5. Missing Function Documentation (syncer.go:782)

Severity: Low

The new updateStateDAHeight function lacks documentation explaining:

  • When it should be called
  • Thread-safety guarantees (or lack thereof)
  • Return value semantics (returns nil if no update needed)

Recommendation: Add godoc comment:

// updateStateDAHeight updates the DA height in the state if the new height is greater.
// This is called when a block has already been processed but we receive a DA event
// with a higher DA height, ensuring state consistency with DA layer.
// Returns nil if the new height is not greater than current height.
func (s *Syncer) updateStateDAHeight(daHeight uint64) error {

6. Test Determinism (syncer_da_test.go:32)

Severity: Low (Already noted by Gemini Code Assist)

Using time.Now() in tests can introduce non-determinism:

gen := genesis.Genesis{ChainID: "tchain", InitialHeight: 1, StartTime: time.Now().Add(-time.Second), ProposerAddress: addr}

Recommendation: Use fixed timestamp:

gen := genesis.Genesis{ChainID: "tchain", InitialHeight: 1, StartTime: time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC), ProposerAddress: addr}

7. Missing saveState Refactoring Opportunity (syncer.go:798-809)

Severity: Low

The new saveState helper function is good, but it was introduced alongside updateStateDAHeight. However, the original trySyncNextBlock function (lines 562-577) still has inline state saving logic that could use this helper:

// In trySyncNextBlock (lines 575-577)
if err := s.saveState(batch, newState); err != nil {
    return fmt.Errorf("failed to save state: %w", err)
}

This is already using saveState, which is great! But I notice the function creates a new batch. This could be optimized.

Recommendation: Consider whether saveState should accept an optional batch parameter to avoid creating multiple batches in trySyncNextBlock where a batch already exists (line 562).


Performance & Concurrency Observations

Thread Safety Analysis

The syncer uses several concurrent workers:

  • processLoop (main event processor)
  • daWorkerLoop (DA fetcher)
  • pendingWorkerLoop (pending event processor)
  • p2pWorkerLoop (P2P syncer)

Current Design: Events are funneled through heightInCh channel to the single-threaded processLoop, which calls processHeightEvent. This is a good design for avoiding most race conditions.

Concern: The new updateStateDAHeight function is only called from processLoop (via processHeightEvent), so it shouldn't have concurrency issues in practice. However, this isn't enforced or documented.


Positive Observations

Good separation of concerns: The new updateStateDAHeight function properly encapsulates the DA height update logic.

Proper error wrapping: Uses fmt.Errorf with %w throughout.

Comprehensive test: The test validates both in-memory and persisted state, which is exactly what's needed.

Backwards compatibility: The fix doesn't change any existing behavior for the normal case where blocks aren't already processed.

Minimal changes: The fix is surgical and doesn't touch unrelated code.


Test Coverage Gap

Codecov reports 41.17% patch coverage with 10 lines missing coverage. Looking at the new code:

Lines likely missing coverage in syncer.go:

  • Error path in updateStateDAHeight when batch creation fails (line 791-793)
  • Error path when UpdateState fails (line 799-801)
  • Error path when Commit fails (line 803-805)
  • The actual error return in saveState (lines 800, 804)

Recommendation: Add a test that mocks store failures to cover error paths.


Summary

The PR successfully fixes the bug where DA height wasn't incremented for already-processed blocks. The core logic is sound and the test validates the fix. However, there are concerns around:

  1. Race condition potential in updateStateDAHeight (though likely not triggered in practice due to single-threaded event processing)
  2. Missing atomic update of s.daHeight after state save
  3. Incomplete error handling for DA height update failures
  4. Test coverage gaps for error paths and edge cases

Recommendations Priority

  1. High: Add s.SetDAHeight(daHeight) in updateStateDAHeight after successful save
  2. High: Document thread-safety assumptions in updateStateDAHeight
  3. Medium: Add test for concurrent/edge cases
  4. Medium: Decide on error handling strategy for updateStateDAHeight failures
  5. Low: Use fixed timestamp in test
  6. Low: Add godoc for new function

Overall, this is a solid fix for a real bug. With the suggested improvements, it will be production-ready. Great work on identifying and fixing this synchronization issue! 🎯


@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug where the Data Availability (DA) height was not being updated when a block was processed via P2P sync before being received from the DA layer. The changes introduce logic in processHeightEvent to call a new function, updateStateDAHeight, to persist the correct DA height even for already-processed blocks. A comprehensive test case has been added to validate this specific scenario, ensuring both the in-memory and persisted states are correctly updated. The overall approach is sound and effectively resolves the issue. I have one minor suggestion to improve test determinism.

addr, pub, signer := buildSyncTestSigner(t)

cfg := config.DefaultConfig()
gen := genesis.Genesis{ChainID: "tchain", InitialHeight: 1, StartTime: time.Now().Add(-time.Second), ProposerAddress: addr}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.Now() in tests can introduce non-determinism, potentially leading to flaky tests in the future. It is a best practice to use a fixed timestamp to ensure that tests are fully reproducible.

Suggested change
gen := genesis.Genesis{ChainID: "tchain", InitialHeight: 1, StartTime: time.Now().Add(-time.Second), ProposerAddress: addr}
gen := genesis.Genesis{ChainID: "tchain", InitialHeight: 1, StartTime: time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC), ProposerAddress: addr}

@codecov

codecov Bot commented Nov 20, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 42.85714% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.91%. Comparing base (d8d1709) to head (6c70a51).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
block/internal/syncing/syncer.go 42.85% 6 Missing and 6 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2867   +/-   ##
=======================================
  Coverage   64.90%   64.91%           
=======================================
  Files          81       81           
  Lines        7243     7259   +16     
=======================================
+ Hits         4701     4712   +11     
- Misses       1998     2000    +2     
- Partials      544      547    +3     
Flag Coverage Δ
combined 64.91% <42.85%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tac0turtle tac0turtle closed this Nov 20, 2025
@tac0turtle tac0turtle deleted the marko/increment_da_height branch November 20, 2025 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant